(57) Abstract: A vector permutation system (100) for a single-instruction multiple-data microprocessor has a set of vector registers
(110) which feed vectors to permutation logic (120) and then to a negate block (130), where they are permuted and selectively negated
according to control parameters received from a selected one of a set of control registers (140). A control arrangement (145, 150)
selects which control register is to provide the control parameters. In this way no separate permutation instructions are necessary or
need to be executed, and no permutation parameters need to be stored in the vector registers (110). This leads to higher performance,
a smaller vector register file and hence a smaller size of the microprocessor and better program code density.



WO 2004/038598 A1



For two-letter codes and other abbreviations, refer to the
"Guidance Notes on Codes and Abbreviations" appearing at the
beginning of each regular issue of the PCT Gazette.






PCT/EP2003/011176 




ARRANGEMENT, SYSTEM AND METHOD FOR VECTOR PERMUTATION IN
SINGLE-INSTRUCTION MULTIPLE-DATA MICROPROCESSORS

Field of the Invention

This invention relates to microprocessors with Single- 
Instruction Multiple-Data (SIMD) capability. 


Background of the Invention 

In the field of this invention microprocessors with SIMD
architecture are arranged to process vector operands. It
is known to provide instructions that permute (rearrange
the order of) the components of vector operands in order
to improve the efficiency of digital signal processing
algorithms on SIMD microprocessors. Permutation
parameters are required to determine the characteristics
of the permutation to be performed.

However, this approach has the disadvantage(s) that if
the vector permutation requires extra instructions,
performance decreases. If the permutation parameters
and/or the permuted vector operand require extra
registers in the microprocessor's vector register file, a
large register file is required. This increases the
microprocessor's size and has a negative impact on
program code density.



A need therefore exists for an arrangement, system and
method for vector permutation in SIMD microprocessors
wherein the abovementioned disadvantage(s) may be
alleviated.

Statement of Invention



In accordance with a first aspect of the invention there
is provided an arrangement for vector permutation in SIMD
microprocessors as claimed in claim 1.



In accordance with a second aspect of the invention there
is provided a system for vector permutation in SIMD
microprocessors as claimed in claim 2.

In accordance with a third aspect of the invention there 
is provided a method for vector permutation in SIMD 
microprocessors as claimed in claim 5. 


The arrangement preferably further includes a negate
block coupled to the control means and coupled to receive
and selectively negate vectors from the permutation logic
block according to the control parameters received from
the control means, wherein the control parameters include
permutation parameters and negate parameters.

Preferably the control means includes at least one
counter arranged to provide a sequential order for
selecting one of the plurality of control registers.

The control register parameters are preferably also used
for determining negate characteristics and the step of
permutating further includes the step of selectively
negating the vectors according to the parameters of the
selected control register. Preferably the step of
selecting further includes the following of a sequential
order of the plurality of control registers.

Preferably the sequential order includes automatic
sequencing through a set of fixed control parameters.
Alternatively the sequential order preferably includes
automatic sequencing through a set of programmable
control parameters. The sequential order is preferably
cyclical.

In this way an arrangement, system and method for vector
permutation in SIMD microprocessors is provided in which
no separate permutation instructions are necessary or
need to be executed, and no permutation parameters need
be stored in the vector registers. This leads to higher
performance, a smaller vector register file and hence a
smaller size of the microprocessor and better program
code density.


Brief Description of the Drawings 

One arrangement, system and method for vector permutation
in SIMD microprocessors incorporating the present
invention will now be described, by way of example only,
with reference to the accompanying drawings, in which:

FIG. 1 shows a block schematic diagram of a known
microprocessor with SIMD architecture; and

FIG. 2 shows a block schematic diagram of a
microprocessor system with SIMD architecture
incorporating the present invention.


Description of Preferred Embodiment(s)

Within the field of SIMD architecture, it is known that
permutation and optional negate operations of vector
operands may be performed as side operations of certain
instructions and do not themselves require separate
instructions or execution cycles.

However, programmers need control over when and how such
permutations are performed. In order to control when
permutations are performed, qualifiers are needed. These
qualifiers may be:

- enable/disable mechanisms
- vector register numbers
- instruction types
- other

In order to control how permutations are performed,
permutation parameters, source/destination operands or
optional negate operations are needed. Such permutation
parameters can either be fixed (hard-wired for specific
algorithms) or programmable (stored in registers).

Referring now to FIG. 1, there is shown a prior art
microprocessor 5 with SIMD architecture. A vector
register file 10 of the microprocessor feeds vector
operands into a permutation logic block 20. The vector
register file 10 has a predetermined number of registers.
The number of general purpose and/or vector registers in
modern Reduced Instruction Set Computer (RISC) machines
is typically a power of 2, with 8, 16, 32 and 64 being
the most common numbers. In the example depicted in
FIG. 1, there are 32 128-bit registers, each register
having four 32-bit elements. The last register (register
15) is used to store control parameters for controlling
the permutation logic block 20, as depicted by arrow 17.

Referring now to FIG. 2, there is shown a microprocessor
100 with SIMD architecture. A read port of a vector
register file 110 feeds vector operands into a
permutation logic block 120 and from there into a negate
logic block 130. The vector register file 110 has a
predetermined number of registers. In the example
depicted in FIG. 2, there are 8 128-bit registers (of
which 5 are shown), each register having four 32-bit
elements.

The output of the negate logic block 130 is typically
used as a source operand for a vector Arithmetic Logic
Unit (ALU) (not shown).

Permutation and negate parameters relating to
permutations to be performed upon the vectors of the
vector register file 110 are stored as control parameters
in a series of control registers 140. A control block 145
is coupled to each of the series of control registers 140
and is further coupled to provide the control parameters
therefrom to control the permutation logic block 120 and
the optional negate logic block 130. A counter 150 is
also coupled to the control block 145, the counter 150
being arranged to determine which of the series of
control registers is coupled via the control block 145 to
the permutation logic block 120 and the optional negate
logic block 130 at any one time.
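
The FIG. 2 datapath described above can be sketched in
software as follows. This is a minimal illustrative model
only. The text does not specify how control parameters are
encoded, so the (perm, neg) layout and the names
ControlRegister, permute_negate and select_control are
assumptions made for this example. Elements are modelled as
plain Python integers, ignoring 32-bit wrap-around.

```python
from typing import NamedTuple, Sequence, Tuple


class ControlRegister(NamedTuple):
    """Hypothetical encoding of one control register of the series 140."""
    perm: Tuple[int, int, int, int]      # perm[i] = source element for output element i
    neg: Tuple[bool, bool, bool, bool]   # neg[i] = negate output element i


def permute_negate(vec: Sequence[int], ctrl: ControlRegister) -> list:
    """Model permutation logic block 120 followed by negate logic block 130."""
    permuted = [vec[src] for src in ctrl.perm]                      # block 120
    return [-x if n else x for x, n in zip(permuted, ctrl.neg)]     # block 130


def select_control(regs: Sequence[ControlRegister], counter: int) -> ControlRegister:
    """Model control block 145: counter 150 picks the active control register."""
    return regs[counter]


# Example: a four-element operand read from vector register file 110.
control_registers = [
    ControlRegister(perm=(0, 1, 2, 3), neg=(False, False, False, False)),  # identity
    ControlRegister(perm=(1, 0, 3, 2), neg=(True, False, True, False)),    # swap pairs, negate even lanes
]
operand = [10, 20, 30, 40]
result = permute_negate(operand, select_control(control_registers, 1))
```

With the counter selecting the second control register, the
operand is pair-swapped and its even lanes are negated
before being passed on as an ALU source operand.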

In operation, the microprocessor 100 will commence with
the counter 150 pointing at a given control register of
the series 140, such as a first control register 141.
When a permutation is to be performed (all qualifiers
true), the control parameters (permutation and negate
parameters) stored in the first control register 141 are
provided via the control block 145 to the permutation
logic block 120 and to the optional negate logic block
130. The contents of the vector register file 110 are
then processed by the permutation logic block 120 and the
optional negate logic block 130 according to these
control parameters. It will be noted that the optional
negate logic block 130, being optional, may or may not
perform a negate function on the contents of the vector
register file 110, depending upon the received control
parameters.

Once processed, the output vector source operand is sent
to the ALU (not shown) and the counter 150 is
incremented. This causes the control block 145 to select
the next control register of the series 140 (such as the
second control register 142) for the next permutation.
The counter 150 is arranged to cycle through each of the
series of control registers 140 in a repeating manner.
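
The operating sequence just described, in which the counter
150 advances after each permutation and wraps around, might
be modelled as below. The class name PermutationUnit, the
qualifiers_true flag and the tuple encoding of the control
parameters are hypothetical. The behaviour when the
qualifiers are not all true (operand passed through
unchanged, counter held) is likewise an assumption, since
the text does not state it.

```python
class PermutationUnit:
    """Illustrative model of control registers 140, control block 145 and counter 150."""

    def __init__(self, control_registers):
        # Each entry is (perm, neg): perm[i] is the source index for output
        # element i, and neg[i] says whether output element i is negated.
        self.control_registers = list(control_registers)
        self.counter = 0  # starts at the first control register (141)

    def read_operand(self, vec, qualifiers_true=True):
        if not qualifiers_true:
            # Assumption: no permutation is performed and the counter holds.
            return list(vec)
        perm, neg = self.control_registers[self.counter]
        out = [-vec[s] if n else vec[s] for s, n in zip(perm, neg)]
        # Increment and wrap so that the sequence repeats cyclically.
        self.counter = (self.counter + 1) % len(self.control_registers)
        return out


unit = PermutationUnit([
    ((0, 1, 2, 3), (False, False, False, False)),  # register 141: identity
    ((1, 0, 3, 2), (False, True, False, True)),    # register 142: swap pairs, negate odd lanes
])
v = [1, 2, 3, 4]
outputs = [unit.read_operand(v) for _ in range(3)]
```

Three successive operand reads use register 141, then 142,
then 141 again, without any permutation instruction being
issued in between.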

It will be understood that the arrangement, system and
method for vector permutation in SIMD microprocessors
described above provide the following advantages:
No extra instructions are required to permute/negate the
components of vector operands, leading to higher
performance. Furthermore, no further registers of the
vector register file are required to store the
permuted/negated vector operands and the permutation
parameters. It should be noted that even with
programmable permutation parameters, the control
registers 140 of FIG. 2 are significantly smaller than
the vector register 15 of FIG. 1. Since the
microprocessor's register file is smaller, this leads to
a smaller size of the microprocessor and better program
code density (fewer bits in op-codes for vector register
addressing).
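
The code-density point can be illustrated with simple
arithmetic. Assuming register numbers are encoded as plain
binary fields (an assumption, as the text does not describe
the instruction encoding), addressing one of n registers
takes ceil(log2 n) bits. The helper name
register_field_bits and the three-operand instruction
format are hypothetical.

```python
def register_field_bits(num_registers: int) -> int:
    """Bits needed to address one of num_registers with a plain binary field."""
    if num_registers < 1:
        raise ValueError("num_registers must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return (num_registers - 1).bit_length()


bits_fig1 = register_field_bits(32)  # 32-register file, as in FIG. 1
bits_fig2 = register_field_bits(8)   # 8-register file, as in FIG. 2

# Hypothetical three-operand instruction (two sources, one destination).
saved_bits = 3 * (bits_fig1 - bits_fig2)
```

Under these assumptions each register field shrinks from 5
bits to 3 bits, so a three-operand instruction would save 6
bits of op-code space.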


It will be appreciated by a person skilled in the art
that alternative embodiments to that described above are
possible. For example, the control register series 140
and counter 150 may be augmented by multiple counters and
control register series, coupled with qualifiers such as
instruction type or register number. Also the counting
sequence need not repeat in a cyclical fashion, and it is
possible to load the counter(s) with specific sequence
start points by adding just one further instruction. All
of these features may be used to add complexity to the
sequence of permutations and so further increase the
flexibility of the architecture.
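
A minimal sketch of two of these variations, a counter that
can be loaded with a start point and a sequence that does
not wrap, is given below. The names SequenceCounter, load
and advance, and the choice to hold at the last control
register when the sequence is non-cyclic, are assumptions
for illustration only.

```python
class SequenceCounter:
    """Hypothetical counter selecting one of num_registers control registers."""

    def __init__(self, num_registers: int, cyclic: bool = True):
        self.num_registers = num_registers
        self.cyclic = cyclic
        self.value = 0

    def load(self, start: int) -> None:
        """Model the one further instruction that sets a sequence start point."""
        if not 0 <= start < self.num_registers:
            raise ValueError("start point out of range")
        self.value = start

    def advance(self) -> int:
        """Return the current selection, then step to the next one."""
        current = self.value
        if self.cyclic:
            self.value = (self.value + 1) % self.num_registers
        else:
            # Assumption: a non-cyclic counter holds at the last register.
            self.value = min(self.value + 1, self.num_registers - 1)
        return current


cyclic = SequenceCounter(4)
cyclic.load(2)
cyclic_order = [cyclic.advance() for _ in range(5)]

one_shot = SequenceCounter(4, cyclic=False)
one_shot_order = [one_shot.advance() for _ in range(5)]
```

The cyclic counter loaded with start point 2 visits the
control registers in the order 2, 3, 0, 1, 2, while the
non-cyclic one runs 0, 1, 2, 3 and then stays at 3.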

Furthermore the number and size of vector registers may
differ from those described above, it being understood
that the number of vector registers required by the
present invention will be less than that required for an
equivalent prior art arrangement.

Claims

1. An arrangement for vector permutation in a
single-instruction multiple-data microprocessor, the
arrangement comprising:

a permutation logic block coupled to receive and
permutate vectors from at least one vector register
according to control parameters;

a plurality of control registers, each coupled to
selectively provide control parameters to the permutation
logic block; and,

control means coupled between the plurality of control
registers and the permutation logic block and arranged
for selecting one of the plurality of control registers
and for providing the control parameters from the
selected one of the plurality of control registers to the
permutation logic block.

2. A single-instruction multiple-data microprocessor
vector permutation system comprising:

at least one vector register;

a permutation logic block coupled to receive and
permutate vectors from the at least one vector register
according to control parameters;

a plurality of control registers, each coupled to
selectively provide control parameters to the permutation
logic block; and,

control means coupled between the plurality of control
registers and the permutation logic block and arranged
for selecting one of the plurality of control registers
and for providing the control parameters from the
selected one of the plurality of control registers to the
permutation logic block.

3. The arrangement of claim 1 or system of claim 2
further comprising a negate block coupled to the control
means and coupled to receive and selectively negate
vectors from the permutation logic block according to the
control parameters received from the control means,
wherein the control parameters include permutation
parameters and negate parameters.

4. The arrangement or system of any preceding claim
wherein the control means includes at least one counter
arranged to provide a sequential order for selecting one
of the plurality of control registers.

5. A method for vector permutation in a
single-instruction multiple-data microprocessor, the
method comprising the steps of:

providing vectors to be permutated;

selecting one of a plurality of control registers, each
control register containing parameters for determining
permutation characteristics;

permutating the vectors according to the parameters of
the selected control register.

6. The method of claim 5 wherein the control register
parameters are also used for determining negate
characteristics and the step of permutating further
includes the step of selectively negating the vectors
according to the parameters of the selected control
register.

7. The method of claim 5 or claim 6 wherein the step of
selecting further includes the following of a sequential
order of the plurality of control registers.



8. The arrangement or system of claim 4, or method of
claim 7, wherein the sequential order includes automatic
sequencing through a set of fixed control parameters.

9. The arrangement or system of claim 4, or method of
claim 7, wherein the sequential order includes automatic
sequencing through a set of programmable control
parameters.



10. The arrangement, system or method of claims 4, 7, 8 
or 9 wherein the sequential order is cyclical. 

11. An arrangement for vector permutation in
single-instruction multiple-data microprocessors
substantially as hereinbefore described with reference to
FIG. 2 of the accompanying drawings.

12. A system for vector permutation in
single-instruction multiple-data microprocessors
substantially as hereinbefore described with reference to
FIG. 2 of the accompanying drawings.

13. A method for vector permutation in
single-instruction multiple-data microprocessors
substantially as hereinbefore described with reference to
FIG. 2 of the accompanying drawings.